PII: S0031-3203(01)00167-4

Authors

C. Strouthopoulos

N. Papamarkos

A. E. Atsalakis

Abstract

Text extraction in mixed-type documents is a necessary pre-processing stage for many document applications. In mixed-type color documents, text, drawings and graphics appear with millions of different colors. In many cases, text regions are overlaid onto drawings or graphics. In this paper, a new method to automatically detect and extract text in mixed-type color documents is presented. The proposed method is based on a combination of an adaptive color reduction (ACR) technique and a page layout analysis (PLA) approach. The ACR technique is used to obtain the optimal number of colors and to convert the document into its principal colors. Then, using the principal colors, the document image is split into separable color planes. Thus, binary images are obtained, each one corresponding to a principal color. The PLA technique is applied independently to each of the color planes and identifies the text regions. A merging procedure is applied in the final stage to merge the text regions derived from the color planes and to produce the final document. Several experimental and comparative results, exhibiting the performance of the proposed technique, are also presented. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
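The color-plane splitting step described in the abstract can be illustrated with a minimal sketch. It assumes the principal colors have already been found and are passed in as a fixed palette; it does not implement the ACR technique or the PLA stage, and the nearest-color rule (squared Euclidean distance in RGB) and the function names `nearest_color_index` and `split_into_color_planes` are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]   # one RGB pixel
Image = List[List[Color]]      # rows of pixels
Plane = List[List[int]]        # binary image: 1 = pixel belongs to this color


def nearest_color_index(pixel: Color, palette: Sequence[Color]) -> int:
    """Index of the palette color closest to `pixel`.

    Assumption: squared Euclidean distance in RGB; the paper may use
    a different assignment rule.
    """
    return min(
        range(len(palette)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])),
    )


def split_into_color_planes(image: Image, palette: Sequence[Color]) -> List[Plane]:
    """Return one binary plane per palette color.

    Each pixel is mapped to its nearest palette color and marked with 1
    in that color's plane, 0 in all others.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    planes = [[[0] * width for _ in range(height)] for _ in palette]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            planes[nearest_color_index(pixel, palette)][y][x] = 1
    return planes


# Hypothetical 2x2 document image and a three-color palette
# (white background, black text, red graphics).
palette = [(255, 255, 255), (0, 0, 0), (200, 30, 30)]
image = [
    [(250, 250, 250), (10, 5, 0)],
    [(190, 40, 35), (255, 255, 255)],
]
planes = split_into_color_planes(image, palette)
```

Every pixel lands in exactly one plane, so each plane can be analyzed independently as a binary image and the per-plane results combined afterwards.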

Related articles

Adaptive lifting for shape-based image retrieval

We propose to use adaptive wavelet lifting for image retrieval systems that are based on shape detection and multiresolution structures of objects in a database against a background of texture. To measure the performance of our approach, feature vectors are computed based on moment invariants of detail coefficients produced by the adaptive lifting scheme and retrieval rates are obtained by measur...

On the use of Bernoulli Mixture Models for Text Classification

Application of planar shape comparison to object retrieval in image databases

Gaussian mixture parameter estimation with known means and unknown class-dependent variances

Robust voxel similarity metrics for the registration of dissimilar single and multimodal images

In this paper, we develop data driven registration algorithms, relying on pixel similarity metrics, that enable an accurate (subpixel) rigid registration of dissimilar single or multimodal 2D/3D images. Gross dissimilarities are handled by considering similarity measures related to robust M-estimators. In particular, a novel (robust) similarity metric is proposed for comparing multimodal images...

Publication date: 2002
